1 Introduction

The general applicative framework of the ISIS
project^ [ini was to design an NLP interface
for automated telephone-based phone-book
inquiry. The objective of the project was to
define an architecture that improves speech
recognition results by integrating higher-level
linguistic knowledge. The availability of a huge
collection of annotated telephone calls querying
the Swiss phone-book database (i.e., the Swiss
French PolyPhone corpus) allowed us to propose
and evaluate a first functional prototype of a
software architecture for voice access to
databases by phone, and to test our recent
findings in robust semantic analysis obtained
in the context of the Swiss National Fund
research project ROTA (Robust Text Analysis).

One of the main issues we took into
consideration is robustness. Robustness in
dialogue is crucial when an artificial system
takes part in the interaction, since inability
or low performance in processing utterances
will cause unacceptable degradation of the
overall system. As pointed out in [2], it is
often better to have a dialogue system that
tries to guess a specific interpretation in case
of ambiguity rather than ask the user for a
clarification. If this first commitment later
turns out to have been a mistake, a robust
system will be able to interpret subsequent
corrections as repair procedures issued in
order to reach the intended interpretation.

^The ISIS project started in April 1998 and
finished in April 1999. It was funded and
overseen by Swisscom; the partners were EPFL
(LIA and LITH), ISSCO and IDIAP. More
information can be found at
lithwww.epfl.ch/~pallotta/rapportfinal.ps.gz

1.1 ISIS architecture 

Dialogue processing requires a large amount of
domain knowledge as well as linguistic
knowledge in order to ensure acceptable
coverage and understanding. Cooperation between
processing modules and the integration of
various knowledge resources require the design
of a suitable software architecture. In the ISIS
project the processing of the corpus data is
performed at various linguistic levels by
modules organized into a pipeline. Each module
takes as input the output of the preceding
module. The main goal of this architecture is
to understand how far it is possible to go
without using any kind of feedback among the
different linguistic modules. In this paper we
detail the functionality of the semantic module.
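
The pipeline organisation described above can be sketched as
follows. This is only a minimal illustration in Python, not the
ISIS implementation: the stage names (recognize, tag,
analyze_semantics) and their toy behaviour are hypothetical. The
point is that each stage sees only the output of the previous one,
with no feedback.

```python
from functools import reduce
from typing import Callable, List, Tuple

# A stage maps the previous stage's output to its own output.
Stage = Callable[[object], object]


def run_pipeline(stages: List[Stage], data: object) -> object:
    """Feed data through the stages in order, with no feedback between them."""
    return reduce(lambda value, stage: stage(value), stages, data)


# Hypothetical stand-ins for the linguistic modules.
def recognize(signal: str) -> List[str]:
    # Toy "speech recognition": split a transcript into lowercase words.
    return signal.lower().split()


def tag(words: List[str]) -> List[Tuple[str, str]]:
    # Toy tagging: words longer than three letters count as content words.
    return [(w, "content" if len(w) > 3 else "function") for w in words]


def analyze_semantics(tagged: List[Tuple[str, str]]) -> List[str]:
    # Toy semantic analysis: keep only the content words.
    return [word for word, label in tagged if label == "content"]


result = run_pipeline(
    [recognize, tag, analyze_semantics],
    "I would like the number of Mister Dupont",
)
print(result)
```

Because a stage cannot ask an earlier stage to revise its output,
any error made upstream has to be absorbed by the robustness of the
later stages, which is the situation the semantic module is
designed for.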

2 Robust semantic analysis 

In theory, a complete dialogue management
system requires total semantic understanding
of the input. However, as we all know, this
is not possible with current systems, and may
never be possible. Even if we restrict
ourselves to a limited domain, it is still very
difficult to get any useful semantic
representation from free dialogue.

A different approach holds that dialogue
management can be achieved by light parsing
of the input. This method needs neither a full
semantic understanding of the language nor a
deep investigation of the meanings and senses
of the words. It is based merely on knowledge
of certain cue-phrases that describe a shallow
semantic structure of the text. These
cue-phrases or terms should be relevant enough
to give a coherent semantic separation of the
different parts. Nonetheless, the set of terms
must not be rigid, to avoid boolean results:
there must be a set of semantically similar
terms with a degree of confidence for each.
This generates hypothetical semantic
descriptions. In fact, these terms, which
correspond to semantic fields, are able to
isolate parts of the text. If a full and precise
semantic description cannot be obtained, a
minimal description can still be derived.
Therefore, in all cases only the relevant parts
undergo the understanding process. A similar
approach has been proposed by Grefenstette in
[TT|, where the main applications are oriented
towards the extraction of syntactic information
(e.g. grouping adjacent syntactically-related
units and extracting non-adjacent n-ary
grammatical relations). Finite-state parsing
technology has been adopted as a solution in
order to achieve robustness and efficiency at
the implementation level.
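
The idea of cue terms grouped into semantic fields, each with a
degree of confidence, can be illustrated with a small sketch. The
lexicon below (the fields, the terms and the confidence values) is
entirely hypothetical and uses English stand-ins; it is not taken
from the ISIS resources.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: semantic field -> {cue term: confidence in [0, 1]}.
CUE_TERMS: Dict[str, Dict[str, float]] = {
    "request": {"number": 1.0, "phone": 0.8, "call": 0.5},
    "location": {"in": 0.6, "at": 0.5, "street": 0.9},
}


def spot_cues(tokens: List[str]) -> List[Tuple[int, str, float]]:
    """Return (position, semantic field, confidence) for every cue term found."""
    hits = []
    for position, token in enumerate(tokens):
        for field, terms in CUE_TERMS.items():
            if token in terms:
                hits.append((position, field, terms[token]))
    return hits


tokens = "the phone number of the bakery in lausanne".split()
hits = spot_cues(tokens)
print(hits)
```

Each hit is a graded hypothesis rather than a yes/no match, so a
later step can still work with a weak cue when no strong one is
present.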



Although robustness can be applied at either
the syntactic or the semantic level, we believe
it is generally most effective at the semantic
level. This robust analysis needs a model of
the domain in which the system operates, and a
way of linking this model to the lexicon used
by the other components. The model specifies
semantic constraints that apply in the world
and which allow us, for instance, to rule out
incoherent requests. The degree of detail
required of the domain model used by the robust
analyzer depends upon the ultimate task that
must be performed: in our case, furnishing a
query to an information system. The use of
domain knowledge has turned out to be crucial,
since our particular goal is to process queries
without any request for clarification from the
system. Because of the inaccuracy and ambiguity
generated by the previous phases of analysis,
we need to select the best hypotheses and often
recover information lost during that selection.
There are several ways of integrating lexical
resources (e.g. dictionaries, thesauri) and
knowledge bases or ontologies at different
levels of dialogue processing.

2.1 Robust Definite Clause Grammars

LHIP (Left-corner Head-driven Island Parser)
[HjfTl] is a system which performs robust
analysis of its input, using a grammar defined
in an extended form of the Definite Clause
Grammar (DCG) formalism used for the
implementation of parsers in Prolog. LHIP
employs a different control strategy from that
used by Prolog DCGs, in order to allow it to
cope with ungrammatical or unforeseen input. A
number of tools are provided for producing
analyses of the input by the grammar under
certain constraints: for example, to find the
set of analyses that provide maximal coverage
over the input, to find the subset of the
maximal coverage set that have minimum spans,
and to find the analyses that have maximal
thresholds. In addition, other tools can be
used to search the chart for constituents that
have been found but are not attached to any
complete analysis.
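
The three selection criteria mentioned above can be sketched in a
few lines. This is not LHIP's own code or API. The Analysis record
and the definitions used here are assumptions made for
illustration: coverage is taken to be the number of input tokens
consumed by rules, span the distance between the first and last
token of the analysis, and threshold the ratio of coverage to span.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Analysis:
    start: int    # index of the first token inside the analysis
    end: int      # index one past the last token inside the analysis
    covered: int  # number of tokens actually consumed by grammar rules

    @property
    def span(self) -> int:
        return self.end - self.start

    @property
    def threshold(self) -> float:
        # Assumed definition: fraction of the span that is covered.
        return self.covered / self.span


def maximal_coverage(analyses: List[Analysis]) -> List[Analysis]:
    """Analyses covering the largest number of tokens (non-empty input assumed)."""
    best = max(a.covered for a in analyses)
    return [a for a in analyses if a.covered == best]


def minimum_span(analyses: List[Analysis]) -> List[Analysis]:
    """Analyses with the shortest span (non-empty input assumed)."""
    shortest = min(a.span for a in analyses)
    return [a for a in analyses if a.span == shortest]


def maximal_threshold(analyses: List[Analysis]) -> List[Analysis]:
    """Analyses with the highest coverage-to-span ratio (non-empty input assumed)."""
    best = max(a.threshold for a in analyses)
    return [a for a in analyses if a.threshold == best]


analyses = [Analysis(0, 8, 5), Analysis(1, 7, 5), Analysis(2, 5, 3)]
widest = maximal_coverage(analyses)
tightest = minimum_span(widest)
densest = maximal_threshold(analyses)
```

In this toy example the first two analyses tie on coverage, the
second wins on span, and the third, although it covers fewer
tokens, has no gaps at all and therefore the highest threshold.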

Weighted LHIP rules The main goal of in- 
troducing weights into LHIP rules is to induce 
a partial order over the generated hypotheses. 
The following schema illustrates how to build a 
simple weighted rule in a compositional fashion 
where the resulting weight is computed from 
the sub-constituents using the minimum oper- 
ator. Weights are real numbers in the interval 
[0,1]. 

cat(cat(Hyp), Weight) ~~>
    sub_cat1(H1, W1),
    ...,
    sub_catn(Hn, Wn),
    {app_list([H1, ..., Hn], Hyp),
     min_list([W1, ..., Wn], Weight)}.
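
The way the schema combines its sub-constituents can be mimicked
outside LHIP. The sketch below is a plain Python analogue of the
two goals in braces: it concatenates the partial hypotheses (the
role of app_list) and takes the minimum of the weights (the role of
min_list). The function name compose and the example terms are
hypothetical.

```python
from typing import List, Sequence, Tuple

# A constituent is (hypothesis terms, weight in [0, 1]).
Constituent = Tuple[List[str], float]


def compose(sub_constituents: Sequence[Constituent]) -> Constituent:
    """Combine sub-constituents as in the weighted rule schema."""
    hypothesis: List[str] = []
    for terms, _ in sub_constituents:
        hypothesis.extend(terms)  # analogue of app_list
    weight = min(w for _, w in sub_constituents)  # analogue of min_list
    return hypothesis, weight


combined = compose([
    (["name(dupont)"], 0.9),
    (["town(lausanne)"], 0.6),
])
print(combined)
```

With the minimum operator a constituent is never more credible than
its weakest part, which is what induces the partial order over
hypotheses.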

This strategy is not the only possible one,
since the LHIP formalism allows greater
flexibility. Without entering into formal
details, we can observe that if we strictly
follow the above schema and impose a cover
threshold of 1, we are dealing with fuzzy DCG
grammars [T^ 131 . We actually extend this class
of grammars with a notion of fuzzy-robustness,
where weights are used to compute confidence
factors for the membership of islands to
categories^. Note that this can be useful when
we do not want to use deep parsing strategies
and when our goal is to find semantic markers
which allow us to segment the sentence into
coarse-grained chunks. Furthermore, the order
of constituents may play an important role in
assigning weights to different rules having the
same number and type of constituents. Each LHIP
rule returns a weight together with a term
which contributes to building the resulting
parse structure.

^Development of this notion is currently under in- 
vestigation and not yet formalized. 



The confidence factor for a pre-terminal rule 
is assigned statically on the basis of the do- 
main knowledge which allows us to find seman- 
tic markers within the text. 

2.2 Robust semantic parsing 

In our case study we try to integrate the above
principles in order to effectively compute
hypotheses for the query generation task. This
can be done by building a lattice of query
hypotheses and selecting the best ones. The
lattice of hypotheses is generated by means of
an LHIP weighted grammar that extracts what we
call semantic chunks. At the end of this process
we obtain suitable interpretations from which we
are able to extract the content of the query.
The rules are designed considering two kinds
of knowledge: domain knowledge is exploited
to provide quantitative support (a confidence
factor) for our rules; linguistic knowledge is
used to determine constraints that prune the
hypothesis space. We are concerned with lexical
knowledge when we need to specify the lexical
LHIP rules which represent the building blocks
of our parsing system.

Semantic markers are domain-dependent word 
patterns and must be defined for a given cor- 
pus. They identify cue-phrases serving both 
as separators between two logical subparts of 
the same sentence and as anchors for semantic 
constituents. In our specific case they allow us 
to search for the content of the query only in in- 
teresting parts of the sentence. The generation 
of query hypotheses is performed by: compos- 
ing weighted rules, assembling semantic chunks 
and filtering possible hypotheses. 

Lexical knowledge: semantic markers 

As pointed out in lexical knowledge plays 
an important role in Information Extraction,
since it can help guide the analysis process at
various linguistic levels. In our case we are
concerned with lexical knowledge when we need
to specify the lexical LHIP rules which
represent the building blocks of our parsing
system. Semantic markers are domain-dependent
word patterns and must be defined for a given
corpus. They identify cue-words serving both as
separators among logical subparts of the same
sentence and as introducers of semantic
constituents. In our specific case they allow us
to search for the content of the query only in
the interesting parts of the sentence. One of
the most important separators is the
announcement-query separator.
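
The role of semantic markers as separators and introducers can be
illustrated with a short sketch. The marker patterns below are
hypothetical English stand-ins (the corpus itself is Swiss French),
and regular expressions are used here only for brevity; in the
system described, markers are expressed as lexical LHIP rules.

```python
import re
from typing import List, Tuple

# Hypothetical marker patterns, each with a label.
MARKERS = [
    ("announcement_query", re.compile(r"\bi would like\b|\bcould you give me\b")),
    ("name_intro", re.compile(r"\bof (?:mister|mrs)\b")),
    ("town_intro", re.compile(r"\bin\b")),
]


def segment(sentence: str) -> List[Tuple[str, str]]:
    """Split a sentence into (marker label, text following the marker) chunks."""
    found = []
    for label, pattern in MARKERS:
        for match in pattern.finditer(sentence):
            found.append((match.start(), match.end(), label))
    found.sort()  # order markers by their position in the sentence
    chunks = []
    for index, (_, end, label) in enumerate(found):
        # A chunk runs from the end of its marker to the start of the next one.
        next_start = found[index + 1][0] if index + 1 < len(found) else len(sentence)
        chunks.append((label, sentence[end:next_start].strip()))
    return chunks


chunks = segment("hello i would like the number of mister dupont in lausanne")
print(chunks)
```

Everything before the announcement-query separator (here the
greeting) is ignored, so the content of the query is searched for
only in the parts introduced by a marker.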

Generation of hypotheses The generation
of annotation hypotheses is performed by
composing weighted rules, assembling chunks
and filtering possible hypotheses. In this case
the grammar should provide a means of
producing an empty constituent when all
possible hypothesis rules have failed. This is
possible using negation and epsilon-rules in
LHIP. The highest-level constituent is
represented by the whole sentence structure,
which simply specifies the possible orders of
chunks relative to annotation hypotheses. In
the corresponding rules we have specified a
possible order of chunks interleaved with
semantic markers (e.g. separators and
introducers). The computation of the global
weight may be complex; we simply used the
minimum of the confidence values of the
individual hypotheses.
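
A rough sketch of this generation step is given below. It
enumerates one candidate per chunk, falls back to an empty
constituent when a chunk has no candidate, and ranks the resulting
combinations by the minimum of their confidence values. The slot
names and values are hypothetical. The text does not say what
weight an empty constituent receives; the neutral value 1.0 used
here is an assumption chosen so that an empty slot does not
dominate the minimum.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Hypothesis = Tuple[Optional[str], float]  # (value, confidence in [0, 1])

# Assumption: an empty constituent carries a neutral weight.
EMPTY: Hypothesis = (None, 1.0)


def rank_hypotheses(
    slots: Dict[str, List[Hypothesis]],
) -> List[Tuple[float, Dict[str, Optional[str]]]]:
    """Enumerate slot combinations and sort them by global weight, best first."""
    names = list(slots)
    # Fall back to the empty constituent when a slot has no candidate.
    options = [slots[name] or [EMPTY] for name in names]
    ranked = []
    for combination in product(*options):
        frame = {name: value for name, (value, _) in zip(names, combination)}
        weight = min(w for _, w in combination)  # global weight = minimum
        ranked.append((weight, frame))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


ranked = rank_hypotheses({
    "name": [("dupont", 0.9), ("dupond", 0.7)],
    "town": [("lausanne", 0.8)],
    "street": [],  # every rule failed for this chunk
})
best_weight, best_frame = ranked[0]
```

Enumerating the full product is only practical for small lattices;
it is used here to keep the example short.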

2.3 Filtering and completion 

The obtained frame hypotheses can be further
filtered using both structural knowledge (e.g.
constraints imposed by the syntactic analysis)
and domain knowledge (e.g. an ontology like
WordNet). In order to combine the information
extracted in the previous analysis step into
the final query representation, which can be
directly mapped into the database query
language, we make use of a frame structure in
which slots represent information units or
attributes in the database. We combine multiple
theories representing domain knowledge in order
to perform both consistency checking and frame
completion. A simple notion of context is used
to fill by default those slots for which we
have no explicit information. For this type of
hierarchical reasoning we exploit the
meta-programming capabilities of logic
programming, using a meta-interpreter which
allows multiple inheritance among logical
theories p].
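
Frame completion by default and consistency checking can be
illustrated with a much simpler mechanism than the meta-interpreter
described above. The sketch below swaps in a layered lookup
(Python's collections.ChainMap) for inheritance among logical
theories: explicit information is consulted first, then
progressively more general contexts. The slot names, the contexts
and the consistency constraint are all hypothetical.

```python
from collections import ChainMap
from typing import Dict, Optional

# Hypothetical slots of the query frame.
SLOTS = ("name", "first_name", "town", "category")

Frame = Dict[str, Optional[str]]


def complete_frame(extracted: Dict[str, str], *contexts: Dict[str, str]) -> Frame:
    """Fill every slot from the extracted data, then from each context in turn."""
    layered = ChainMap(extracted, *contexts)  # leftmost mapping wins
    return {slot: layered.get(slot) for slot in SLOTS}


def is_consistent(frame: Frame) -> bool:
    # Hypothetical domain constraint: a business entry has no first name.
    return not (frame["category"] == "business" and frame["first_name"] is not None)


call_context = {"town": "lausanne"}        # e.g. where the call comes from
domain_defaults = {"category": "private"}  # most general defaults

frame = complete_frame({"name": "dupont"}, call_context, domain_defaults)
odd_frame = complete_frame(
    {"name": "migros", "first_name": "jean", "category": "business"},
    call_context,
    domain_defaults,
)
```

A linear chain of contexts is weaker than true multiple
inheritance, but it shows the same principle: a slot left empty by
the analysis takes the value of the nearest context that defines
it, and the completed frame is then checked against the domain
constraints.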

3 Conclusions 

So far we have presented a robust speech
understanding system that is not far removed
from many other systems. In particular, keyword
spotting is a technique often used in
restricted domains. We do go further, by using
weighting techniques on the grammar, employing
a logical intermediate representation, and
performing inference on this intermediate
representation. The question we now wish to
address is how we can move forward. Can this
approach be generalized? What are the
consequences of this approach? We will argue
that this method fits into a general approach
that we call a predictive dialogue modeling
approach.

First, however, it is necessary to make some
general remarks about the state of the art in
dialogue processing and the problems that must
be addressed. The advancement from
system-directed queries to mixed strategies is
an important first stage in allowing for more
natural interactive systems. Of course, a
mixed-initiative approach typically generates
higher error rates. Reducing these error rates
involves constraining dialogues, which is
typically done by restricting the domain of
application of the system. Such an approach
allows us to restrict the vocabulary to perhaps
a few hundred words instead of the thousands or
hundreds of thousands of words that we would
need in a more general case.

Observation of human-to-human communication
reveals a large number of phenomena which
present particular problems for machine
analysis. Interruptions, confirmations,
anaphora and ellipsis, as well as the breaks,
repairs, pauses and jumps normally found in
human dialogue, all present difficulties for
machine understanding. Robust processing goes
a long way towards handling some of these
problems. We contend, however, that more
general solutions can only come from having a
model of the domain and of the user. The model
of the user is necessary not only for better
understanding what the user is saying, but also
for matching the expectations of the user in
the interaction with the machine. This is
necessary because it is difficult to
communicate the system's capabilities to the
user. The user does not necessarily know the
vocabulary that the system is capable of
handling, nor the type of questions that the
system may answer. A user model can therefore
be of great benefit in future natural
interactive systems. In addition, in
multi-modal interaction the user model will
allow us to better tailor the use of different
modalities to the user. More importantly, from
our point of view, such a model is part of a
predictive approach to natural interactivity.

The idea of this approach is to continuously
anticipate the interaction with the user. In
other words, analysis should be based on the
expectations of the system. Such an approach
allows us to restrict vocabulary, domain
knowledge, and interaction types to only those
necessary for immediate understanding. In a
sense, dialogue grammars, finite-state
approaches to dialogue, and template approaches
to dialogue are all predictive models. We
anticipate an approach in which more general
models of language, based on the content of
communication, are derived from knowledge of
the domain, the user's knowledge of the domain,
and the system's view of the user's needs,
beliefs, goals and motivations.

3.1 Related works 

As examples of robust approaches applied to
dialogue systems we cite here two systems
which are based on similar principles. In the
DIALOGOS human-machine telephone system
(see [T]) the robust behavior of the dialogue
management module is based both on a
contextual knowledge base of pragmatics-based
expectations and on the dialogue history. The
system identifies discrepancies between
expectations and the actual user behavior, and
in that case it tries to rebuild the
consistency of the dialogue. Since both the
domain of discourse and the user's goals (e.g.
railway timetable inquiry) are clear, it is
assumed that the system and the user cooperate
in achieving mutual understanding. Under this
assumption the system pro-actively asks for the
query parameters and is able to account for
those spontaneously proposed by the user.

In the SYSLID project jH], a robust parser
constitutes the linguistic component of the
query-answering dialogue system. An utterance
is analyzed while, at the same time, its
semantic representation is constructed. This
semantic representation is further analyzed by
the dialogue control module, which then builds
the database query. Starting from a word graph
generated by the speech recognizer module, the
robust parser produces a search path through
the word graph. If no complete path can be
found, the robust component of the parser,
which is an island-based chart parser [12],
selects the maximal consistent partial results.
In this case the parsing process is also guided
by a lexical semantic knowledge base component
that helps the parser resolve structural
ambiguities.



3.2 Future Work 

The limited resources of the project did not
allow us to adequately evaluate the results
and test the system against real situations.
Nonetheless, our final opinion about the ISIS
project is that there are promising directions
in applying robust parsing techniques and
integrating them with knowledge representation
and reasoning. Moreover, we did not commit to
the architecture used, and we envision that
better results can be achieved by moving
towards a distributed agent-based architecture
for natural language processing. An ongoing
project^ at our laboratory is concerned with
these aspects; in it we propose a hybrid
distributed architecture which combines
symbolic and numerical computing by means of
agents providing linguistic services. Within
this architecture knowledge management also
plays a central role, and it is aimed at the
intelligent coordination of the linguistic
agents OS]-